Optimal Camera Placement to Measure Distances Conservatively Regarding
Static and Dynamic Obstacles



M. Hänel and S. Kuhn and D. Henrich
Angewandte Informatik III

Universität Bayreuth
95440 Bayreuth, Germany

{maria.haenel, stefan.kuhn,
dominik.henrich}@uni-bayreuth.de

Abstract: In modern production facilities, industrial robots
and humans are supposed to interact while sharing a common
working area. In order to avoid collisions, the distances
between objects need to be measured conservatively, which
can be done by a camera network. To estimate the acquired
distance, unmodelled objects, e.g., an interacting human, need
to be modelled and distinguished from premodelled objects
like workbenches or robots by image processing such as the
background subtraction method.

The quality of such an approach depends heavily on the
settings of the camera network, that is, the positions and
orientations of the individual cameras. Of particular interest in
this context is the minimization of the distance error incurred by
using the objects modelled by the background subtraction method
instead of the real objects. Here, we show how this minimization
can be formulated as an abstract optimization problem. Moreover, we
state various aspects of the implementation as well as reasons
for the selection of a suitable optimization method, analyze the
complexity of the proposed method and present a basic version
used for extensive experiments.

Index Terms: Close range photogrammetry, optimization,
camera network, camera placement, error minimization

I. Introduction 

Nowadays, human/machine interaction is no longer restricted
to humans programming machines and operating
them from outside their working range. Instead, one tries
to increase the efficiency of such a cooperation by allowing
both actors to share the same working area. In such a context,
safety precautions need to be imposed to avoid collisions,
i.e., the distance between human and machine interacting in a
common area needs to be reconstructed continuously in order
to detect critical situations. To this end, usually a network
of cameras is installed to, e.g., ensure that every corner
of the room can be watched, every trail can be followed
or every object can be reconstructed correctly. Within this
work, we focus on computing an optimal configuration of
the camera network in order to measure the distances as
accurately as possible but still conservatively.

After a brief review of previous results concerning the
distance measurement described above, we show how an unmodelled
(human) object can be contoured by a 3D background
subtraction method. We extend this scheme to cover both
static and dynamic obstacles, some of which are modelled in
advance but still occlude the vision of the sensor. In Section
III, we rigorously formulate the problem of minimizing the
error made by using the associated model instead of the
original collective of unmodelled objects. Considering the
implementation of a solution method, we discuss various
difficulties such as the evaluation of the intersection of
the cones corresponding to each camera in Section IV and
also give an outline of concepts to address these issues. In the
final Sections V and VI, we analyze the complexity of our
basic implementation by a series of numerical experiments
and conclude the article by giving an outlook on methods to
further improve the proposed method.

II. State of the art 

Many camera placement methods have to deal with a
trade-off between the quality of observations and the quantity
of pieces of information which are captured by the cameras.
The latter aspect is important for camera networks which have
to decide whether an item or an action has been observed.
There have been investigations about how to position and
orientate cameras subject to observing a maximal number of
surfaces [8] and different courses of action [1,5,6] as well as
maximizing the volume of the surveillance area [14] or the
number of objects [11]-[13]. Another common goal in this
context is to be able to observe all items of a given set while
minimizing the number of cameras, in addition to obtaining their
positions and orientations [4,11]-[13]. This issue is called the
"Art Gallery Problem", especially when speaking of
two-dimensional space.

Apart from deciding whether an object has been detected
by a camera network, another task is to obtain detailed
geometrical data of the observed item like its position and
measurements of its corners, curves, surfaces, objects etc. As
described in [10], determining this information for distances
smaller than a few hundred meters by cameras belongs to the
field of 'close range photogrammetry'. In order to configure
a camera network to cope with such tasks, one usually
minimizes the error between observed and reconstructed items.
Often the phrase 'Photogrammetric Network Design' is used
to express minimizing the reconstruction error for several
(three-dimensional) points. The default assumption in this
context, however, is that no occlusions occur, cf. [7,16]-[19]
for details. Optimally localizing an entire object which is not
occluded is a task treated in [3]. Furthermore, many
approaches compensate for the increasing complexity of the
problem by oversimplifying matters: One common approach
is to restrict the number of cameras (in [7] two cameras
are used) or their position and orientation. Considering the
latter, known approaches are the viewing sphere model given
in [18,19] or the idea of situating all cameras on a plane and
orientating them horizontally, cf. [3].

In contrast to these approaches, we discuss optimizing
positions and orientations of cameras in a network in the
context of the background subtraction method which is used
to determine a visual hull of a solid object. By means of this
visual hull, distances can be computed easily, which makes
this approach a different kind of simplification. Occlusions of
solids to be reconstructed obscure the view and enlarge the
visual hull. In order to get the minimal error of the construction
of the hull, [23] assumes that minimizing the occurring
occlusions of solids also reduces and thus specifies their
possible locations. However, neither obstacles nor opening
angles other than $\pi$ are discussed in [23], and additionally the
orientation of the camera is neglected as a variable since it is
simply orientated towards the object. In [2], static obstacles
are considered but the cameras are chosen out of
a preinstalled set.

Since we are not interested in optimizing the quantity of
observed objects but the quality of data, our approach differs
from most of the discussed results. Note that the quality
of information can be obtained by various types of image
processing. Here, we consider the background subtraction
method to obtain a visual hull of a given object. Within our
approach, we optimize the positions and orientations of a
fixed number of cameras so as to minimize the error that is
made by evaluating distances to the visual hull. In contrast
to existing results, our goal is to incorporate the aspects of
occurring static or dynamic obstacles into our calculations
but also to exploit all degrees of freedom available in an
unconstrained camera network. Nevertheless, distances are
to be evaluated conservatively.

III. Visibility Analysis 

Within this section, we show how the objective function
depends on the cameras' positions and orientations, and
thus successively build up a mathematical representation
of the optimization problem. We start off by defining the
critical area as well as the unmodelled objects to be reconstructed
and the corresponding abstract models in Section III-A.
This will allow us to formally state the objective function
which we aim to minimize. In the following Section III-B,
we define the camera network and its degrees of freedom, i.e.
the position and orientation of each camera. These degrees
of freedom allow us to parametrize the model of the object
to be reconstructed. Additionally, this tuple of degrees
of freedom will serve as the optimization variable in the
minimization problem stated in Section III-C. To cover all
possible scenarios, this problem is extended by incorporating
both static and dynamic obstacles as well as an evolving time
component.

A. Formalizing the Problem 

Let $U \subset \mathbb{R}^3$ be a spatial area based on which information
about humans, perils, obstacles and also cameras can be
given. Consider $S \subset U$ to be the surveillance area, where
critical points of the set $C \subset S$ as well as objects, such as a
human or a robot, are monitored.

For the moment, we neglect obstacles completely and
just distinguish two types of objects to explain the basic
idea of reconstructing an object by means of a camera
network: If a detailed model of an object exists describing
its appearance, like location, shape, color or else, the object
is called modelled. If this is not the case, the object is called
unmodelled. This is motivated by the following scenario: If
humans move unpredictably within the surveillance area, i.e.
without a given route, their appearance is unmodelled and
needs to be reconstructed to be used for further calculations.
The model of an unmodelled object can be reconstructed by
means of a camera network. Therefore, let $O_u(a) \subset S$ be
the complete set of points included in one or more unmodelled
objects, depending on the appearance of the unmodelled objects
specified by the parameter $a \in \mathbb{R}^d$. We refer to these
objects as the unmodelled collective. Since automatically placing
the cameras for such a scenario is not computable without
information on the unmodelled collective, we impose the
assumption that the distribution $\mathcal{V} : 2^{\mathbb{R}^d} \to [0, 1]$ of the
appearance $a \in \mathbb{R}^d$ is known.

As the safety of a human being must be guaranteed in any
case, the distance

$$d(C, O_u(a)) := \min\{\, d(x, y) \mid x \in C,\ y \in O_u(a) \,\}$$

has to be computed conservatively and security measures
need to be taken if the unmodelled collective $O_u(a)$ approaches
the critical points $C$. Here $d(\cdot,\cdot)$ denotes a standard
distance function. If the exact set $O_u(a)$ was known, this
distance could be evaluated easily. As we do not directly
know the value of $a \in \mathbb{R}^d$ and therefore can only guess the
points that are included in $O_u(a)$, we need to approximate a
(as a consequence also conservative) model $M(a) \subset S$, see
Fig. 1.
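
The conservative character of such a model can be illustrated on finite point sets. The following is only a minimal sketch under assumptions that are not part of the original text: the sets are small hypothetical point lists, the Euclidean metric is used as the distance function, and the model is any superset of the unmodelled collective. Because the minimum is then taken over a larger set, the distance to the model can never exceed the true distance.

```python
import math

def set_distance(c_points, o_points):
    """Minimal Euclidean distance between two finite point sets."""
    return min(math.dist(x, y) for x in c_points for y in o_points)

# Hypothetical toy data: one critical point, a two-point unmodelled
# collective, and a model that contains the collective plus one
# spurious point closer to the critical point.
critical = [(0.0, 0.0, 0.0)]
unmodelled = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
model = unmodelled + [(2.0, 0.0, 0.0)]

d_true = set_distance(critical, unmodelled)   # distance to the real object
d_model = set_distance(critical, model)       # distance to the superset model
```

In this toy case the model underestimates the distance, which errs on the safe side.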




Fig. 1. Surveillance area S: Distance between critical points C
and unmodelled collective (black) and distance to the approximated
model (green)

Note that for now $M(a)$ is an abstract approximation of
$O_u(a)$ with respect to the parameter $a$ only. In order to
actually compute $M(a)$, a sensor network and its degrees
of freedom come into play, see Section III-B for details.
Still, the abstract approximation allows us to formalize our
overall task, i.e. to minimize the difference between the
approximation based distance $d(C, M(a))$ and the real
distance $d(C, O_u(a))$. Taking the assumed distribution of the
parameter $a$ into account, we aim to minimize the functional

$$\int_{a \in \mathbb{R}^d} \left| d(C, O_u(a)) - d(C, M(a)) \right| \, d\mathcal{V}(a). \tag{1}$$

Note that for the optimization we need to be aware of
possible appearances of the object in order to let the integral
pass through their space $a \in \mathbb{R}^d$. Thus all appearances
$a \in \mathbb{R}^d$ of the unmodelled collective should be known.

B. Building a Model with the Camera Network 

In the previous section we saw that in order to evaluate the
functional (1), a model $M(a)$ of the unmodelled collective
$O_u(a) \subset U$ is required. To obtain such a model, we impose
a camera network $\mathcal{N}$ consisting of $n \in \mathbb{N}$ cameras. Each
camera can be placed and orientated with a setting from
$E = U \times [-\pi, \pi] \times [-\frac{\pi}{2}, \frac{\pi}{2}]$. Here, the first term corresponds to the
position of the camera whereas the second and third denote
the angles 'yaw' and 'pitch' respectively. For simplicity of
exposition, we exclusively considered circular cones in our
implementation, which allowed us to neglect the angle 'roll'
as a degree of freedom in the setting of a single camera.
Hence, each camera exhibits five degrees of freedom, three
for the position and two for its orientation.
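
A setting with five degrees of freedom and a circular viewing cone can be sketched as follows. This is an illustration only, not the implementation used for the experiments: the axis convention (yaw about the vertical z-axis, pitch as elevation) and the parameter `half_angle` for half of the opening angle are assumptions made here for concreteness.

```python
import math

def view_direction(yaw, pitch):
    """Unit viewing direction for yaw in [-pi, pi], pitch in [-pi/2, pi/2].

    Assumed convention: yaw rotates about the z-axis, pitch is elevation.
    """
    return (math.cos(pitch) * math.cos(yaw),
            math.cos(pitch) * math.sin(yaw),
            math.sin(pitch))

def in_cone(point, position, yaw, pitch, half_angle):
    """True if `point` lies inside the circular cone of the camera."""
    v = tuple(p - q for p, q in zip(point, position))
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return True  # the apex itself counts as inside
    d = view_direction(yaw, pitch)
    cos_angle = sum(a * b for a, b in zip(v, d)) / norm
    # Inside the cone if the angle to the axis is at most half_angle.
    return cos_angle >= math.cos(half_angle)
```

Since the cone is rotationally symmetric about its axis, a roll angle would not change the result of `in_cone`, which is why it can be dropped as a degree of freedom.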

Thus, each camera can be regarded as a tuple $(e, p) \in
E \times U$ whereas its produced output regarding the parameter
$a \in \mathbb{R}^d$ of the unmodelled collective is a function

$$(E \times U \times \mathbb{R}^d) \to V$$

$$(e, p, a) \mapsto \kappa_{e,a}(p)$$

that is, given the setting $e \in E$ and the appearance of the
unmodelled collective $a \in \mathbb{R}^d$, each point $p \in U$ is mapped
onto a sensor value $v \in V$ where

$$V := \{\text{free}, \text{occupied}, \text{undetectable}\}.$$

This set is adjusted to the evaluation of the network's
images by the change detection method (e.g. background
subtraction). The sensor value $\kappa_{e,a}(p)$ of a point $p \in U$ is
free if this point is perceived as not part of the unmodelled
collective. The value occupied represents the possibility that
the point could be part of the unmodelled collective (i.e.
the point might be occupied by the collective). If the sensor
cannot make the decision, as is the case for cameras
that cannot 'see' behind walls, the value is undetectable.
Obstacles like walls will be discussed in Section III-C. To
obtain the values of the set $V$ one could apply the method of
background subtraction, which is discussed in detail in [9].
Although our method is not restricted to the pixel model which
is considered in [9], the idea of this work remains the same.
Thus, we will only provide the preceding formalization of the
values, so as to explain their role in building the model of an
unmodelled collective.

According to the definition of the set $V$, each camera splits
the set $U$ into three different subsets:

$$P_f(e, a) = \{ u \in U \mid \kappa_{e,a}(u) = \text{'free'} \}$$
$$P_{oc}(e, a) = \{ u \in U \mid \kappa_{e,a}(u) = \text{'occupied'} \}$$
$$P_{nd}(e, a) = \{ u \in U \mid \kappa_{e,a}(u) = \text{'undetectable'} \}$$

We state here without proof that we have constructed these
parts to be a pairwise disjoint decomposition of $U$, i.e.

$$U = P_f(e, a) \cup P_{oc}(e, a) \cup P_{nd}(e, a)$$

and $P_f(e, a) \cap P_{oc}(e, a) = P_f(e, a) \cap P_{nd}(e, a) = P_{oc}(e, a) \cap
P_{nd}(e, a) = \emptyset$ hold.

The unmodelled collective $O_u(a)$ cannot be situated inside
$P_f(e, a)$; all we know is

$$O_u(a) \subset P_{oc}(e, a) \cup P_{nd}(e, a) = U \setminus P_f(e, a).$$

Since this inclusion holds for the parameter $a \in \mathbb{R}^d$ and one
camera with settings $e \in E$, the following is true
if we consider a camera network $\mathcal{N}$ consisting of $n$ cameras
with settings $e_i$, $i = 1, \dots, n$:

$$O_u(a) \subset (U \setminus P_f(e_1, a)) \cap \dots \cap (U \setminus P_f(e_n, a))
= U \setminus (P_f(e_1, a) \cup \dots \cup P_f(e_n, a))$$

Note that this set would already be a good approximation of the
unmodelled collective if we considered the entire set $U$.
However, as we only monitor the surveillance area $S$, we
define the desired model $M(a)$ of the unmodelled collective
$O_u(a)$ as the intersection with the set $S$, i.e.,

$$M(a) = M(a, e_1, \dots, e_n)
:= S \cap \big( U \setminus (P_f(e_1, a) \cup \dots \cup P_f(e_n, a)) \big) \tag{2}$$
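
On a discretized space the construction of the model reduces to set operations. The sketch below is an illustration under stated assumptions: points are hypothetical integer voxel indices, and the free parts of the cameras are given as plain Python sets rather than being computed from images. It mirrors the definition of the model as the surveillance area without everything that at least one camera perceives as free.

```python
def build_model(universe, surveillance, free_parts):
    """Return S intersected with (U minus the union of all free parts).

    `free_parts` holds one set per camera with the points it sees as free.
    """
    seen_free = set().union(*free_parts)
    return surveillance & (universe - seen_free)

# Hypothetical 1D toy scene with voxel indices 0..9.
universe = set(range(10))
surveillance = {2, 3, 4, 5, 6, 7}
free_cam_1 = {0, 1, 2, 3}
free_cam_2 = {3, 4, 8}

model_one_camera = build_model(universe, surveillance, [free_cam_1])
model_two_cameras = build_model(universe, surveillance, [free_cam_1, free_cam_2])
```

Adding a camera can only remove points from the model, so every additional free part tightens the approximation without ever excluding a point that no camera has seen as free.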

This is the basic model that can be used to calculate
Formula (1). In the following, we will extend our setting
to incorporate a time dependency and to cover different
types of obstacles.

C. Adding Time and Obstacles 

So far, we have only considered a static scene to be
analyzed. Motivated by moving objects, we extend our
setting by introducing a time dependency to the process
under surveillance. Therefore, we declare the time interval
of interest $I = [t_0, t_*]$, in which $t_0$ denotes the moment
the reference image is taken and $t_*$ corresponds to the last
instant the surveillance area ought to be observed. Thus, the
unmodelled collective $O_u(a(t))$, its probability distribution
$\mathcal{V}(a(t))$ and its approximation $M(a(t)) \subset S$ as well as
the set of critical points $C(t)$ change in time $t \in I$. As a
simple extension of (1) we obtain the time dependent error
functional

$$\int_{t_0}^{t_*} \int_{a(t) \in \mathbb{R}^d} \left| d(C(t), O_u(a(t))) - d(C(t), M(a(t), e_1, \dots, e_n)) \right| \, d\mathcal{V}(a(t)) \, dt \tag{3}$$

In a second step, we add some more details to the scene
under surveillance. To this end, we specify several categories
and properties of objects $O \subset S$, which we are particularly
interested in and which affect the reconstruction of the
current scene. Right from the beginning we have considered
unmodelled objects. In contrast to modelled objects, these
objects need to be reconstructed in order to track them. In the
following, we additionally distinguish objects based on the
characteristic behavior "static/dynamic", "target/obstacle"
and "rigid/nonrigid", neglecting those objects that cannot be
noticed by the sensors (like a closed glass door for cameras
without distance sensor).

We define a target $T \subset O$ of a sensor network as
an object which ought to be monitored and in our case
reconstructed. An object $B \subset O$ which is not a target
is called obstacle. Furthermore, we distinguish obstacles
based on their physical character: An obstacle $B$ features a
rigid nature (like furniture) if the impenetrability condition
$T \cap B^r = \emptyset$ holds, and is denoted by the index $r$ in $B^r$.

The method proposed in [9] constructs a visual hull
of an object by background subtraction, i.e. via change
detection. In the context of change detection methods another
characteristic behavior of objects is relevant: A static object
is an object $O_s \subset O$ which is known to affect the given
sensors in the same way at any time. If this is not the case,
it is called dynamic, which we indicate by adding a subscript
and a time dependency, $O_d(t)$. More specifically, within the
proposed background subtraction method the value of each
pixel of a current image is subtracted from its counterpart
within the reference image which has been taken beforehand.
Thus, any change (like size/color/location) occurring after
the reference image has been taken leaves a mark on the
subtracted image, i.e., if the scene consists of static objects
only, then the subtracted image is blank. For this reason,
static objects must be placed in the scene before the reference
picture is taken, and dynamic objects must not.
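
The pixelwise subtraction described above can be sketched in a few lines. This is a simplified illustration and not the method of [9]: images are assumed to be small grayscale grids given as nested lists, and the `threshold` parameter is a hypothetical noise tolerance introduced here.

```python
def changed_pixels(reference, current, threshold=0):
    """Return the set of (row, col) whose value differs from the reference."""
    return {
        (r, c)
        for r, row in enumerate(current)
        for c, value in enumerate(row)
        if abs(value - reference[r][c]) > threshold
    }

# Hypothetical reference image containing only static objects.
reference = [[10, 10, 10],
             [10, 50, 10]]

# A scene with static objects only yields a blank subtracted image.
static_only = [[10, 10, 10],
               [10, 50, 10]]

# A dynamic object entering after the reference image leaves a mark.
with_dynamic = [[10, 10, 90],
                [10, 50, 10]]

blank = changed_pixels(reference, static_only)
marks = changed_pixels(reference, with_dynamic)
```

The pixel with value 50 belongs to a static object that was present when the reference image was taken, so it produces no mark in either case.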

Within the rest of this work, we consider all unmodelled
objects to be reconstructed, i.e. in (3) we have

$$O_u(a(t)) := T(a(t)) \tag{4}$$

Consequently, the unmodelled collective and its distance to
the critical points are dynamic targets. Thus, we always
consider an obstacle to be a modelled obstacle since all our
unmodelled objects are targets. Furthermore, all obstacles
are considered rigid. To formalize the human-robot scene let
$B_s^r \subset O$ and $B_d^r(t) \subset O$ be the collectives of static and
of dynamic obstacles with time $t \in [t_0, t_*]$, respectively.
We incorporate these new aspects into the model of the
unmodelled collective in (3) by intuitively extending our
notation to

$$M(a(t), e_1, \dots, e_n) := M(a(t), e_1, \dots, e_n, B_s^r, B_d^r(t)). \tag{5}$$

Last, as a robot is a dynamic obstacle in addition to a
safety threat (e.g. when moving too fast), we define the
critical points in (3) as the collective of dynamic obstacles

$$C(t) := B_d^r(t). \tag{6}$$

Note that there are dynamic obstacles next to dynamic
targets, i.e. the unmodelled collective. Thus a dynamic
obstacle could easily be regarded as an object of the
unmodelled collective since both evoke similar reactions of the
change detection method. In our approach the obstacles are
fully modelled and thus define a target free zone since they
are physical obstacles. Still, inaccuracies of the change
detection may accidentally leave fragments outside the dynamic
obstacle, in our case outside the critical points. As a
consequence the required distance between critical points and
target is reduced to zero. Publication [9] solves this issue
by introducing plausibility checks, in which predicates that
characterize the target (like volume, height, etc.) are used to
sort out the fragments.

In conclusion, our aim is to solve the problem

Minimize (3)
using definitions (4), (5) and (6)
subject to $e_1, \dots, e_n \in E$

i.e., to compute the optimal positions and orientations of $n$
cameras with settings $e_1, \dots, e_n$ such that the measurement
error is minimized.

IV. Aspects of Optimization 

There are various ways to compute and minimize Equation (3).
They differ in three respects, discussed in the following
subsections: representing the model, evaluating the integral and
solving the optimization problem.

A. Discretization of time and distribution 

We would at first like to state that the distance
$d(C, M(a, \dots))$ between the model and another set does
not need to be continuous at every appearance $a$ even if
the distance $d(C, O_u(a))$ to the unmodelled collective is
continuous at $a$. This point can also be made for Equation (3)
but we stick to Equation (1) for reasons of simplicity. Such
a case is illustrated in Fig. 2. As the original unmodelled
collective $O_u(a)$ of the appearance $a \in \mathbb{R}^d$ does not
necessarily need to be convex or even connected, given the
settings $e_i \in E$, $i = 1, \dots, n$, the nonfree parts of the sensors
$U \setminus P_f(e_i, a)$ do not need to be connected, either. The model
is constructed as an intersection of these parts (see Equation
(2)). But, as intersections of disconnected parts do not need
to be continuous in $a \in \mathbb{R}^d$ (e.g. with respect to the Hausdorff
metric), the distance $d(C, M(a, \dots))$ between the model
and another set does not need to be continuous at every
appearance $a$.

Fig. 2. Discontinuity of the distance between perilous points C and
the approximated model, consisting of the intersection of the nonfree
part of camera 1, $U \setminus P_f(e_1, a)$ (green), and the nonfree part of camera
2, $U \setminus P_f(e_2, a)$ (orange).

Since in general only integrals with continuous integrands can
be calculated as a whole, while others need to be split at the
points of discontinuity, such a discontinuous function becomes a
problem when it is the integrand as in Formula (1). In our case a
point of discontinuity of the distance as a function of $a$ cannot
be derived easily, as it would have to be extracted from an
individual, unrelated analysis depending not only on $a$ or
$t$ but also on the sensor settings $e_i$, $i = 1, \dots, n$. While
in simple cases this is possible, we spare such a case-by-case
analysis by discretizing appearance and time. Here, just the
$l = 1, \dots, L$ most important appearances of the unmodelled
collective $a_l \in \mathbb{R}^d$ and the $h = 1, \dots, H$ most important time
steps $t_h \in [t_0, t_*]$ with $t_0 = t_1$ and $t_* = t_H$ and with their
weights

$$\omega_{l,h} := \mathcal{V}(a_l(t_h)) \in [0, 1]$$

are modelled. Accordingly, the following weighted sum
approximates the integral of Formula (3):

$$\sum_{h=1}^{H} \sum_{l=1}^{L} \omega_{l,h} \left| d(C(t_h), O_u(a_l(t_h))) - d(C(t_h), M(a_l(t_h), e_1, \dots, e_n)) \right| \tag{7}$$
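
The discretized objective is a double sum over time steps and appearances and can be sketched directly. The following is a minimal illustration, not the authors' C++ implementation: all sets are assumed to be finite point lists, the Euclidean metric is assumed as distance function, and the nested-list layout `weights[h][l]`, `targets[h][l]`, `models[h][l]` is a hypothetical choice made here.

```python
import math

def set_distance(a_points, b_points):
    """Minimal Euclidean distance between two finite point sets."""
    return min(math.dist(x, y) for x in a_points for y in b_points)

def discretized_error(weights, critical, targets, models):
    """Weighted sum over time steps h and appearances l of the absolute
    difference between the true distance and the model-based distance."""
    total = 0.0
    for h, row in enumerate(weights):
        for l, w in enumerate(row):
            d_true = set_distance(critical[h], targets[h][l])
            d_model = set_distance(critical[h], models[h][l])
            total += w * abs(d_true - d_model)
    return total

# Hypothetical example with H = 1 time step and L = 2 appearances.
weights = [[0.5, 0.5]]
critical = [[(0.0, 0.0, 0.0)]]
targets = [[[(4.0, 0.0, 0.0)], [(0.0, 3.0, 0.0)]]]
models = [[[(4.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [(0.0, 3.0, 0.0)]]]

error = discretized_error(weights, critical, targets, models)
```

In this toy case only the first appearance contributes, since the model of the second appearance coincides with its target.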



B. Discretizing space by voxels 

The next challenge, building an intersection of (free-form)
solids, has been a subject of discussion for
more than a quarter of a century and still is an issue of recent
investigations. The publication [22] describes three main
approaches to this issue, distinguished by the representation of
the solids, each with its pros and cons.

Solids represented by polygonal meshes can be intersected
by exact arithmetic and interval computation, checking
surface membership afterwards. The major concerns of this
approach are robustness and efficiency (e.g., when intersecting
two tangentially connected polyhedra/polygons, inside-out
facets are computed).

Approximate methods (e.g. applying exact methods to
a rough mesh of solids and refining the result) exist for
meshes, too. Robustness problems (constructing breaks in the
boundary) are in this case compensated by time-consuming
perturbation methods or interdependent operations which
prevent parallel computing.

There are also techniques for solids transferred to image
space (ray representation). While many of these mainly help
rendering rather than evaluating the boundary, there are some
that can be applied to intersection purposes (Layered Depth
Image). Unfortunately, when converting these representations
back into meshes many geometric details are destroyed.

Losing geometric details is also the case for volumetric
approaches. Converting surfaces with sharp corners and
edges into volumetric data (like voxels) without losing data
for reconstruction purposes is a challenging task even with
oversampling. This also holds true for a voxel representation,
but voxels on the other hand are easily obtained and can be
combined robustly by boolean operations. In addition to that
we need a data structure for which distances and volumes are
calculated easily, properties which are ensured for voxels.
For these reasons our approach uses a voxel based model
which is obtained by boolean operations on the free parts of
the sensors.

C. Optimization method 

The solver of an optimization problem is a strategy that
evaluates candidate solutions by plugging them into the
objective function of the problem and improves them until
an optimum of the objective function is reached. To choose a
suitable solver for the specified problem, there are different
characteristics of the objective function $Err(e_1, \dots, e_n)$ that
need to be considered.

At first, we regard the surveillance area minus the cone of a
camera as the 'undetectable' part of this
camera, depending on the setting $e$ of the camera. Remember
that the undetectable area could be part of the model of
the unmodelled collective. Now imagine the cone rotating
in 'yaw' direction continuously. One can easily see that the
distance between any given point of the surveillance area and
this cone is not convex in $e$ (as an exception, the chosen point
can be included in the 'undetectable' part and the distance
is therefore 0 for all $e$).

The second characteristic to be discussed is the discontinuity
of $Err(e_1, \dots, e_n)$ with respect to $e$. Due to the
voxel based model, distances are only evaluated to a finite set
of points. When calculating the distances we need to jump
from one point to the next even if settings are just altered
gradually. Thus, the objective function is discontinuous and
constant in between these discontinuities. Even if we used a
non-voxel-based model, discontinuities would appear due to
the intersections of disconnected parts mentioned in Section
IV-A.
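
The piecewise constant behavior caused by the voxel grid can be reproduced in a one-dimensional toy setting. The sketch below rests on assumptions introduced only for illustration: an object occupies the half-line starting at `edge_position`, a voxel counts as occupied if its center lies inside the object, and the distance is measured from the origin to the nearest occupied voxel center. The continuously varied edge position stands in for a gradually altered camera setting.

```python
def voxel_distance(edge_position, voxel_size=0.25):
    """Distance from the origin to the first voxel center at or beyond
    `edge_position` on a 1D grid (voxel k has center (k + 0.5) * voxel_size)."""
    k = 0
    while (k + 0.5) * voxel_size < edge_position:
        k += 1
    return (k + 0.5) * voxel_size

# Moving the edge gradually leaves the distance unchanged ...
d_a = voxel_distance(0.30)
d_b = voxel_distance(0.36)
# ... until a voxel center is passed and the distance jumps.
d_c = voxel_distance(0.38)
```

A gradient of such a function is zero almost everywhere and undefined at the jumps, which is one reason why derivative based solvers are of little use here.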

The objective function's properties complicate the search
for a suitable solver. As elucidated in standard references
on nonlinear optimization like [15], most algorithms take
advantage of a characteristic behavior like convexity,
differentiability or at least continuity, which cannot be
guaranteed in our case. This applies to all deterministic solvers
for nonlinear programs such as Sequential Quadratic
Programming, all kinds of local search algorithms (Downhill
Simplex, Bisection, Newton, Levenberg-Marquardt etc.) and
many others. Moreover, the problem cannot be transformed
to a standard form of solvers like branch-and-bound,
decompositions, cutting planes or outer approximation. This leaves
us with non-deterministic, e.g., stochastic solvers. We have
chosen the method MIDACO, which is based on the ant
colony algorithm and samples solutions randomly where they
appear to be most promising, see [20,21] for details.
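
To illustrate why a sampling based solver copes with such an objective, the following sketch shows a deliberately simple stochastic search. It is explicitly not MIDACO and not an ant colony algorithm; it is a generic random search written for this illustration, and the function names, the step width and the staircase test objective are all assumptions. The only property it shares with the chosen solver is that it needs objective values alone, without convexity, derivatives or continuity.

```python
import random

def stochastic_search(objective, lower, upper, iterations=2000, seed=0):
    """Minimize `objective` over a box by sampling around the incumbent."""
    rng = random.Random(seed)
    best = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    best_value = objective(best)
    for _ in range(iterations):
        # Sample near the incumbent and clip to the feasible box.
        candidate = [
            min(hi, max(lo, x + rng.gauss(0.0, 0.1 * (hi - lo))))
            for x, lo, hi in zip(best, lower, upper)
        ]
        value = objective(candidate)
        # Accept ties as well, so the search can drift across plateaus.
        if value <= best_value:
            best, best_value = candidate, value
    return best, best_value

def staircase(x):
    """Piecewise constant, non-differentiable toy objective."""
    return sum(abs(round(4 * xi)) for xi in x)

best, best_value = stochastic_search(staircase, [-2.0, -2.0], [2.0, 2.0])
```

Accepting candidates of equal value matters on piecewise constant objectives, because otherwise the search would stall on the first plateau it reaches.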

D. Complexity 

The solver is an iterative procedure which stochastically
generates a tuple of settings $e_i$, $i = 1, \dots, n$ (one setting
for each camera) within each iteration step, based on knowledge
of previous generations. Given these settings, the model,
the distances and the objective function consisting of the
weighted sum given in Equation (8) are evaluated. This
continues until a stopping criterion is fulfilled. In order to
compute the complexity of the method, assume that upon
termination the $I$-th iteration step has been reached. The
process of obtaining the objective value of Formula (8) is
only implemented in a basic version, in which for the given
tuple of settings $e_i$ all of the $H$ time steps and $L$ appearances
are evaluated to test for each of the $r$ voxels whether it
is included in the intersection in question. The intersection
test uses all of the $f_u$ facets of the unmodelled collective
as well as most of the $f_s$ static facets and $f_d$ dynamic facets.
Summing up these components gives us the complexity

$$O\big( I \cdot r \cdot \{\, n f_s + H (n + L) f_d + H L n f_u \,\} \big)$$

of the method.
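
As a plausibility check, the bound can be evaluated for concrete numbers. The sketch below assumes that the complexity reads $I \cdot r \cdot (n f_s + H (n + L) f_d + H L n f_u)$ with $f_s$, $f_d$ and $f_u$ denoting the numbers of static, dynamic and unmodelled facets; the function name and its parameter names are hypothetical. The example values are those of the basic setup in Tab. I (16 x 12 x 12 voxels, 6 cameras, 2 time steps, 3 events, 8 static, 24 dynamic and 24 unmodelled facets).

```python
def intersection_tests(iterations, voxels, cameras, time_steps, appearances,
                       static_facets, dynamic_facets, unmodelled_facets):
    """Operation count estimate following the assumed complexity bound."""
    per_voxel = (cameras * static_facets
                 + time_steps * (cameras + appearances) * dynamic_facets
                 + time_steps * appearances * cameras * unmodelled_facets)
    return iterations * voxels * per_voxel

# Estimate for a single iteration step of the basic setup.
per_iteration = intersection_tests(
    iterations=1, voxels=16 * 12 * 12, cameras=6, time_steps=2,
    appearances=3, static_facets=8, dynamic_facets=24, unmodelled_facets=24)
```

Under these assumptions one iteration step amounts to roughly three million elementary tests, and the count grows linearly in the number of voxels.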

V. Experiments 

Since we use a stochastic solver on the non-convex
problem of camera configuration, the obtained solutions
(i.e. tuples of settings) most likely differ from one another
although the same objective value (i.e. deviation of distances)
might have been found. Therefore, we ran groups of 20
solver calls with the same parameters to observe the average
outcome. One examination consists of a few groups of test
runs which only differ in one parameter. We made examinations
about changing resolutions, facets, objects, amount
of events, amount of cameras and starting point. Unless
stated otherwise, the basic setup given in Tab.
I is used.



TABLE I

BASIC SETUP, WHICH IS USED IF NO OTHER ASSUMPTIONS ARE MADE

modelled part of the scene   dimension/amount

surveillance area S:         cuboid 4m x 3m x 3m
voxel resolution:            (16 x 12 x 12)
critical points:             all points inside the dynamic collective
static collective:           8 facets at 2 objects
dynamic collective:          24 facets at 6 objects in 2 timesteps
unmodelled collective:       24 facets at 6 objects in 3 events (of distrib.)
camera placement:            6 cameras all over the surveillance area
starting solution:           cameras are placed and orientated randomly
                             all over S
stop criteria:               maximal time limit 3h
                             optimization tolerance [diagon. of voxel]



We additionally assumed that the dynamic collective is 
also considered to be the set of critical points. Thus, we 
were able to model a robot (dynamical object and critical 
points) spinning too fast in direction of a human (unmodelled 
object). 
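
Since the optimization tolerance in Tab. I is given in units of the voxel diagonal, it is useful to know the length this unit corresponds to in the basic setup. The following sketch computes it from the extent of the surveillance area and the voxel resolution; the function name is hypothetical and the voxels are assumed to be axis-aligned boxes of equal size.

```python
import math

def voxel_diagonal(extent, resolution):
    """Edge lengths and space diagonal of one voxel of a regular grid."""
    edges = [length / count for length, count in zip(extent, resolution)]
    return edges, math.sqrt(sum(e * e for e in edges))

# Basic setup: cuboid 4m x 3m x 3m discretized into 16 x 12 x 12 voxels.
edges, diagonal = voxel_diagonal((4.0, 3.0, 3.0), (16, 12, 12))
```

For the basic setup the voxels are cubes with 0.25 m edges, so one diagonal corresponds to about 0.43 m.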

Furthermore, Tab. II contains all test parameters and their
ranges. The aim of this section is to summarize all the
examinations defined in Tab. II and, in particular, to answer
the following central questions: Can the desired optimization
tolerance be satisfied in time, i.e. will the target be
approximated as accurately as needed? How many iteration cycles
are needed? What is the operating time of one cycle, of each
iteration step and the components of one step? What is the
highest memory consumption?

TABLE II

THIS IS AN OVERVIEW OF ALL EXAMINATIONS. AN EXAMINATION
CONSISTS OF A FEW GROUPS OF TEST RUNS, EACH GROUP DIFFERS
ONLY IN ONE PARAMETER.

modelled part
of the scene        alterations

voxel res.:         (16 + 4i x 12 + 3i x 12 + 3i) for i = 0, 1, 2, 3, 4, 5
static coll.:       8 + 60i facets for i = 0, ..., 5 at 2 obj.
                    6 + 4i facets at 2 + i objects, i = 0, ..., 3
dynamic coll.:      24 + 60i facets, i = 0, ..., 5, at 3 obj./2 timest.
                    2 + i obj., i = 0, ..., 3, w. 24 + 4i fac./1 timest.,
                    objects placed randomly
                    i = 1, ..., 5 timest. w. 2i obj., 8i fac., 3 events
unmodelled coll.:   24 + 60i facets, i = 0, ..., 5, at 2 obj./3 events
                    2 + i obj., i = 0, ..., 3, w. 24 + 4i fac./1 event,
                    objects placed randomly
                    i = 1, ..., 5 events w. 2i obj., 8i fac., 3 timest.
cameras:            i = 3, ..., 9
restrictions of     cameras placed only at 'ceiling'
the settings'       cameras placed only in the 'upper fourth'
domain:

A. Hardware and Software 

We implemented the optimization problem in C++ and
compiled it with 'gcc' version 4.0.2 20050901 (prerelease),
optimized with the setting '-O3', on SuSE Linux version
10.0. We used only one of the two cores of an AMD
Opteron(tm) Processor 254 with 2.8 GHz (dynamically scaled
from 1 GHz to 2.8 GHz). Further information can be taken from
Tab. III and IV.

model name:         AMD Opteron(tm) Processor 254
cpu MHz:            1004.631
cache size:         1024 kB
clflush size:       64
cache_alignment:    64

TABLE III

PART OF THE OUTPUT OF $: CAT /PROC/CPUINFO

MemTotal:            4038428 kB
MemFree:              886856 kB
Buffers:              431016 kB
Cached:              2079360 kB
SwapCached:                0 kB
Active:              1431004 kB
Inactive:            1138428 kB
SwapTotal:          12586916 kB
SwapFree:           12586916 kB

TABLE IV

PART OF THE OUTPUT OF $: CAT /PROC/MEMINFO

B. Optimization tolerance

As a second stopping criterion next to the three-hour time
limit, we introduced the optimization tolerance, which is the
maximal objective value a tuple of settings may be mapped
to for the optimization to terminate. It is designed to
depend on the length of a voxel's diagonal. In many cases the
solver was able to satisfy the desired optimization tolerance
within the predefined maximal time. The following exceptions
exceeded the time limit. We recorded an increasing time
consumption of one iteration step (beyond linear) when
gradually raising the resolution of the voxel discretization.
Due to the time criterion, runs with a resolution of
more than 24 x 18 x 18 were terminated before satisfying
the optimization tolerance (see Fig. 3 and 5). Increasing the
number of dynamic obstacles resulted in too many iteration
steps (over 160000 at most, compared to fewer than 45000
when increasing the number of static obstacles, cf. Fig. 4)
and thus decreased the number of tests for which the optimization
tolerance was satisfied in time, as illustrated in Fig. 6.
In rare cases, a similar outcome was observed when the number
of randomly placed unmodelled objects was increased.

A combination of both occurrences (the time loss in each
iteration step and the requirement of too many iteration steps)
was observed for test runs utilizing a small number
of cameras (with three cameras it was practically impossible
to compute a satisfactory result, see Fig. 7). In the case
of the tests on dynamic obstacles and too few cameras, the
model of the unmodelled collective could not be produced
optimally before the maximal computing time was up. We
experienced similar results for all tests concerning restrictive
domains: none of the tests reached the optimization tolerance
(0.046 m²), but all of them stayed below the value 0.25 m².
This could be a sign that, e.g., in our test setting six cameras
on the ceiling cannot make the model approximate the unmodelled
collective closely enough.
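
The interplay of the two stopping criteria described in this subsection can be illustrated by a minimal sketch. It is not taken from our implementation: the function names, the room extent used in the example and, in particular, the rule that the tolerance equals a factor times the squared voxel diagonal are assumptions made purely for illustration. The text only states that the tolerance depends on the length of a voxel's diagonal.

```python
import math

TIME_LIMIT_S = 3 * 60 * 60  # three-hour time limit stated in the text


def voxel_diagonal(room_size, resolution):
    """Length of one voxel's space diagonal for an axis-aligned grid."""
    edges = [extent / cells for extent, cells in zip(room_size, resolution)]
    return math.sqrt(sum(e * e for e in edges))


def tolerance_from_voxel(room_size, resolution, factor=1.0):
    """Hypothetical rule (assumption): tolerance = factor * diagonal^2."""
    return factor * voxel_diagonal(room_size, resolution) ** 2


def should_stop(objective_value, tolerance, elapsed_s,
                time_limit_s=TIME_LIMIT_S):
    """Return the reason for stopping, or None to keep iterating."""
    if objective_value <= tolerance:
        return "tolerance"
    if elapsed_s >= time_limit_s:
        return "time limit"
    return None
```

With an assumed room of 4 m x 3 m x 3 m and the coarsest resolution 16 x 12 x 12, every voxel edge is 0.25 m, so this hypothetical rule yields a tolerance of 0.1875 m². A run stops as soon as either the objective value falls to the tolerance or the time limit is reached, whichever happens first.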

C. Time consumption 

When raising the number of events, time steps, facets or
objects of any of the collectives, we also recorded a
linearly increasing time consumption for one iteration step.
Among all parameters, the resolution of the voxel discretization
and the number of dynamic objects appear to be the most critical
ones. Using more cameras, however, resulted in a lower time
loss in one iteration step within our range of camera numbers
(for three cameras we required about 315 ms on average,
whereas for nine cameras ca. 170 ms were needed). Of
course, this effect can only last until the optimization tolerance
is satisfied (i.e., the model approximates the unmodelled collective
as accurately as needed), and hence time consumption will rise
again when using an even greater number of cameras.

Fig. 3. Scatter plot: With refined resolution, the mean (light
grey squares) of the number of iteration steps (each represented
by a dark grey square) in one group was higher. Columns: groups
of 20 iterations with different resolutions.

Fig. 4. Scatter plot: Dynamic obstacles (and perilous points)
complicate the iteration. Columns: groups of 20 iterations
with additional dynamic objects (red) and static objects (blue).

Fig. 5. Bar chart: A resolution finer than (24 x 18 x 18)
made it impossible to satisfy the predefined optimization
tolerance in time.

Fig. 6. Bar chart: The more dynamic objects were spread across
the surveillance area, the fewer test runs satisfied the desired
optimization tolerance.

Without giving a detailed explanation of how one
iteration step is calculated with our test setting's camera
network, we would like to state that increasing the number
of facets or cameras and refining the voxel resolution increases
the time consumption of the intersection test. However, no
intersection test except for those with refined voxel resolution
exceeded 15 ms on average. The test runs with 36 x 27 x 27
voxels, six cameras and 24 facets reached an average
of 50 ms. After the intersection, the related voxels need
to be combined into clusters in order to check a free
part's height or volume (and to decide whether it could be a
human). This task took about two to four times as long
as the intersection test, which is mainly due to its direct
dependence on the resolution, but also due to the misshaping
of the model (as the clustering seems to depend indirectly
on the number of cameras). As the duration of an iteration step
is mostly taken up by intersecting and clustering, Fig. 8 also
shows the decreasing time consumption when using more
cameras.

Fig. 7. Bar chart: Five to nine cameras could contour the
unmodelled collective best. Horizontal axis: amount of cameras.

Fig. 8. Bar chart: The total time consumption decreased with an
increased number of cameras because the clustering (orange)
outweighed the actual intersection test (blue).
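
The clustering step mentioned above can be pictured as connected-component labelling on the voxel grid. The following sketch is only one possible way to do this and is not our actual implementation: the 6-neighbourhood, the function names and the size thresholds in `could_be_human` are assumptions chosen for illustration.

```python
from collections import deque


def cluster_voxels(occupied):
    """Group voxels (a set of (x, y, z) indices) into 6-connected clusters."""
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    unvisited = set(occupied)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = {seed}
        queue = deque([seed])
        # Breadth-first flood fill starting from the seed voxel.
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in neighbours:
                n = (x + dx, y + dy, z + dz)
                if n in unvisited:
                    unvisited.remove(n)
                    cluster.add(n)
                    queue.append(n)
        clusters.append(cluster)
    return clusters


def could_be_human(cluster, voxel_edge, min_height=1.0, min_volume=0.03):
    """Hypothetical size test; thresholds (m, m^3) are placeholders."""
    zs = [z for _, _, z in cluster]
    height = (max(zs) - min(zs) + 1) * voxel_edge
    volume = len(cluster) * voxel_edge ** 3
    return height >= min_height and volume >= min_volume
```

Each voxel is visited once, so the effort of such a flood fill grows with the number of voxels, which is consistent with the direct dependence on the resolution noted above.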



D. Memory 
Fig. 9. Plot of the increasing maximal virtual memory that
was used when refining the resolution of voxels.

Measurements of the maximal virtual memory while altering
the resolution resulted in an ascending graph (beyond
linear), cf. Fig. 9. The highest demand for virtual memory
was measured while testing with the resolution 36 x 27 x 27
(a total of 32868 kB). The graphs of the maximum
demand for virtual memory versus the number of facets and the
number of cameras ascend only slowly. Both show a linear increase
of about 350 kB to 450 kB over our range of parameters.
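
As a rough illustration of why growth beyond linear in the refinement step is to be expected, the number of voxels for the resolutions (16 + 4i) x (12 + 3i) x (12 + 3i) of Tab. II can be computed directly. The sketch below only evaluates this count; it does not model the memory layout of the implementation, and the function name is chosen for illustration.

```python
def voxel_count(i):
    """Number of voxels at refinement step i of Tab. II."""
    return (16 + 4 * i) * (12 + 3 * i) * (12 + 3 * i)


counts = [voxel_count(i) for i in range(6)]
# counts == [2304, 4500, 7776, 12348, 18432, 26244]
```

The count is a cubic polynomial in i: between the coarsest and the finest resolution each edge count grows by a factor of 2.25, while the number of voxels grows by a factor of about 11.4.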

VI. Conclusion and Future Prospects 

We managed to build a camera placement optimization
algorithm that computes the location and orientation of a given
number of cameras inside a specified surveillance area.
Only randomly placed dynamic obstacles, too few cameras,
too restricted placements and a too refined voxel resolution
are critical for this method. Apart from that, we have
succeeded in minimizing the error made by evaluating distances
to the visual hull of a given object up to the optimization
tolerance. In contrast to existing results, we are able to model
a surrounding area with static and moving obstacles without
limiting camera positions or orientations and still evaluate
distances conservatively.



Still, in order to make the model approximate the unknown
collective even better, higher resolutions are desired.
Consequently, some improvements of the algorithm still need
to be implemented. The following alterations of the algorithm
may improve the time consumption: First of all, it is
possible to parallelize the iterations of the solver as well as
some intersection tests. But as the number of iteration steps
of the solver ranged between about 500 and 160000, the
first goal should be to decrease both the expected number of
iteration steps and their variance. Placing the initial
positions of the cameras roughly around the surveillance area
and leaving the fine tuning to the algorithm might achieve
this.

Some consideration should also be given to saving many
clustering and intersecting processes by leaving out unnecessary
calculations. One of these calculations is the summing up of
L · H addends (the number of appearances times the number of
time steps), all of which have to be simulated. Time loss will
be reduced by cancelling the evaluation of the sum as soon as
it exceeds the current optimal value. Also, appropriate
data structures like Oct-Trees and BSP-Trees for intersection
and inclusion tests, which would reduce the time spent on the
intersection test, have not been implemented yet.
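
The proposed cancelling of the sum could look roughly like the following sketch. It assumes that all addends are non-negative, so that a partial sum exceeding the current optimum can never recover. The names `bounded_sum` and `simulate_error` are hypothetical, and `simulate_error` merely stands in for the costly simulation of one (appearance, time step) pair.

```python
def bounded_sum(addends, best_so_far):
    """Sum non-negative addends lazily; abort once the sum exceeds best_so_far.

    Returns (sum_so_far, number_of_evaluated_addends, completed).
    """
    total = 0.0
    evaluated = 0
    for term in addends:
        total += term
        evaluated += 1
        if total > best_so_far:
            return total, evaluated, False
    return total, evaluated, True


calls = []


def simulate_error(l, h):
    """Placeholder for one costly simulation (assumption: returns >= 0)."""
    calls.append((l, h))
    return 1.0


L, H = 3, 4
# The generator evaluates each addend only when it is actually requested.
terms = (simulate_error(l, h) for l in range(L) for h in range(H))
total, evaluated, completed = bounded_sum(terms, best_so_far=2.5)
```

In this toy run only three of the twelve addends are simulated before the partial sum 3.0 exceeds the bound 2.5. The saving in practice depends on how early poor settings exceed the current optimum.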